CoReMo 2.3 Plagiarism Detector Text Alignment Module - Notebook for PAN at CLEF 2014
Abstract
This paper presents the basics of the three tuning approaches of the evolving CoReMo Plagiarism Detector, with a focus on the Text Alignment task. In the last PAN edition, we observed that the required tuning depends on the corpus, and that a tuning overfitted to one corpus can give results far from those expected on another. This year's goal was to find a way for the system to tune itself, aiming to outperform any fixed-parameter tuning and to come very close to the overfitted performance on any corpus. All three tuning approaches achieve a high Plagdet score on every corpus; the aim here is to show the effect of each advance on each corpus and on the corpora of all years combined. The approaches include new features for parameter self-tuning, based on the size and the ratio of the compared documents. For the competition, we chose the tuning approach with the most consistent detection quality under any condition (called WideTuning).
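For readers unfamiliar with the Plagdet score mentioned above, the following is a minimal sketch, assuming the usual PAN definition: the harmonic mean of precision and recall divided by log2(1 + granularity). The function name and the example values are illustrative and are not taken from the paper.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Combine precision, recall and granularity into one Plagdet score.

    Assumes the usual PAN definition: F1 / log2(1 + granularity).
    """
    if granularity < 1:
        # Granularity counts detections per plagiarism case, so it is at least 1.
        raise ValueError("granularity must be >= 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


# Illustrative values only: a detector that splits each case into
# several fragments is penalised through the granularity term.
print(plagdet(0.5, 0.5, 1.0))
print(plagdet(0.5, 0.5, 3.0))
```

With granularity equal to 1 the score reduces to the plain F1 measure; any fragmentation of detections lowers it.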
Similar resources
Text Alignment Module in CoReMo 2.1 Plagiarism Detector Notebook for PAN at CLEF 2013
This paper describes the process and basics of the Text Alignment Module in the CoReMo 2.1 Plagiarism Detector, which won the Plagiarism Detection Text Alignment task in the PAN-2013 edition on both evaluation criteria, efficacy and efficiency, achieving the best detections as well as the best runtime. Its high detection efficacy is mainly due to the special features of the contextual n-gram...
Crosslingual CoReMo System (Contextual Reference Monotony) - Notebook for PAN at CLEF 2011
This paper presents an extended version of the external CoReMo System (Contextual Reference Monotony, ranked 6th in PAN2010), now with crosslingual capability (ranked 5th in PAN2011 / Plagdet 0.2340). It is not the best-ranked system for translated plagiarism (ranked 3rd / Plagdet 0.3587), but it has high reliability and speed (global results in 30 minutes), low computer requirements and its own intern...
Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015
The text alignment corpus construction task at the PAN 2015 competition consists of preparing a plagiarism corpus that provides various obfuscation types and degrees of obfuscation. Its format and metadata structure should also follow previous PAN plagiarism corpora. In this paper, we describe our approach to constructing a monolingual Persian plagiarism corpus that can...
PAN 2015 Shared Task on Plagiarism Detection: Evaluation of Corpora for Text Alignment: Notebook for PAN at CLEF 2015
In this paper we describe and evaluate the corpora submitted to the PAN 2015 shared task on plagiarism detection for text alignment. We received mono- and cross-language corpora in the following languages (pairs): English, Persian, Chinese, and Urdu-English, English-Persian. We present an independent section for each submitted corpus including statistics, discussion of the obfuscation techniques ...
Developing Bilingual Plagiarism Detection Corpus Using Sentence Aligned Parallel Corpus: Notebook for PAN at CLEF 2015
Plagiarism detection is the process of locating text reuse within a suspicious document. Plagiarism detection corpora are used for evaluating plagiarism detection systems. In this paper, we present a bilingual Persian-English plagiarism detection corpus. We provide our corpus for the task of text alignment corpus construction in the PAN 2015 competition. Our approach is based on parallel cor...